Formal and functional assessment of the pyramid method for summary content evaluation

Author

  • Rebecca J. Passonneau
Abstract

Pyramid annotation makes it possible to evaluate the content of machine-generated (or human) summaries both quantitatively and qualitatively. Evaluation methods must themselves be held to the same measuring stick as other research methods, namely evaluation. First, a formal assessment of pyramid data from the 2003 Document Understanding Conference (DUC) is presented; it addresses whether the form of annotation is reliable and whether score results are consistent across annotators. Combining interannotator reliability measures for the two manual annotation phases (pyramid creation, and annotation of system peer summaries against pyramid models) with significance tests of the similarity of system scores from distinct annotations yields highly reliable results. The most rigorous test compares peer system rankings produced from two independent sets of pyramid and peer annotations, and the two sets yield essentially the same rankings. Second, three years of DUC data (2003, 2005, 2006) are used to assess the reliability of the method across distinct evaluation settings: distinct systems, document sets, summary lengths, and numbers of model summaries. This functional assessment addresses the method's ability to discriminate among systems across years. Results indicate that the method's statistical power is more than sufficient to identify statistically significant differences among systems, and that it varies little across the three years.
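A minimal sketch of how pyramid scores and a ranking comparison fit together, assuming the commonly cited pyramid formulation: each summary content unit (SCU) is weighted by the number of model summaries that express it, and a peer's score is the total weight of its matched SCUs divided by the maximum weight attainable by a summary containing the same number of SCUs. The SCU labels, system names, and the choice of Kendall's tau (tau-a) to compare the rankings from two independent annotations are hypothetical illustrations, not the paper's data or necessarily its exact statistics; DUC variants of the score normalize somewhat differently.

```python
from itertools import combinations

# Hypothetical pyramid: each model summary is the set of SCU labels it expresses.
MODELS = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "E"}, {"A", "B"}]

# Hypothetical peer annotations from two independent annotators.
ANNOTATION_1 = {"sys1": {"A", "B"}, "sys2": {"A", "D"}, "sys3": {"C", "E"}}
ANNOTATION_2 = {"sys1": {"A", "B", "C"}, "sys2": {"A", "E"}, "sys3": {"C"}}


def scu_weights(models):
    """Weight of each SCU = number of model summaries that express it."""
    weights = {}
    for summary in models:
        for scu in set(summary):
            weights[scu] = weights.get(scu, 0) + 1
    return weights


def pyramid_score(peer_scus, weights):
    """Observed SCU weight divided by the best weight achievable with the
    same number of SCUs (assumption: only SCUs present in the pyramid count)."""
    matched = {scu for scu in peer_scus if scu in weights}
    observed = sum(weights[scu] for scu in matched)
    best = sum(sorted(weights.values(), reverse=True)[: len(matched)])
    return observed / best if best else 0.0


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a over systems scored in both annotations; ties count as neither."""
    systems = sorted(set(scores_a) & set(scores_b))
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems to compare rankings")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    weights = scu_weights(MODELS)
    scores_1 = {sys: pyramid_score(scus, weights) for sys, scus in ANNOTATION_1.items()}
    scores_2 = {sys: pyramid_score(scus, weights) for sys, scus in ANNOTATION_2.items()}
    print("annotation 1:", scores_1)
    print("annotation 2:", scores_2)
    print("Kendall's tau between rankings:", kendall_tau(scores_1, scores_2))
```

With this toy data both annotations rank sys1 above sys2 above sys3, so tau is 1.0 even though individual scores differ. Comparing rankings rather than raw scores mirrors the abstract's point that what matters for system evaluation is whether independent annotations order the systems the same way.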

Similar resources

A functional model for assessing Iran's cinematic websites

Background and Objectives: Today, websites with diverse and varied uses have revolutionized all social, scientific, educational, artistic, commercial, and other fields of thought. In the meantime, the cinema has not gone away with this technological advancement, and a large number of cinema websites have been set up to help film makers in this field. Whatever the users of a website, the main pu...

An Evaluation Summary Method Based on a Combination of Content and Linguistic Metrics

This paper presents a new automated method for evaluating the content of a text summary. The proposed method is based on a combination of features encompassing scores of content and others of linguistic quality. This method relies on a learning technique called linear regression. The objective of this combination is to predict the PYRAMID score from the features used. In order to evaluate the p...

PEAK: Pyramid Evaluation via Automated Knowledge Extraction

Evaluating the selection of content in a summary is important both for human-written summaries, which can be a useful pedagogical tool for reading and writing skills, and machine-generated summaries, which are increasingly being deployed in information management. The pyramid method assesses a summary by aggregating content units from the summaries of a wise crowd (a form of crowdsourcing). It h...

Validation of the Early Feeding Skills Assessment Scale for the Evaluation of Oral Feeding in Premature Infants

Background: Feeding difficulties are common and important in premature infants. In order to identify neonatal feeding difficulties, clinicians and nurses require assessment tools to conduct an objective evaluation of infant oral feeding (breast/bottle-feeding). Early identification of infants with feeding difficulty is critical to implement appropriate therapies and op...

Trends in Speech and Language Rehabilitation in Iran

This paper is a short review on the Jann and content of speech and language rehabilitation services and the trend of their institutionalization in Iran. A summary of formal education in speech and language therapy in Iran as originated by establishing a 4 year BS rehabilitation program in the College of Rehabilitation Sciences in Tehran in 1974 is given. Since then, speech and language Rehabili...

Journal title:
  • Natural Language Engineering

Volume 16

Publication date: 2010